Intonational Boundaries, Speech Repairs and Discourse Markers: Modeling Spoken Dialog
Authors
Abstract
To understand a speaker's turn in a conversation, one needs to segment it into intonational phrases, clean up any speech repairs that might have occurred, and identify discourse markers. In this paper, we argue that these problems must be resolved together, and that they must be resolved early in the processing stream. We put forward a statistical language model that resolves these problems, does POS tagging, and can be used as the language model of a speech recognizer. We find that accounting for the interactions between these tasks improves performance on each task, as well as POS tagging and perplexity.

1 Introduction

Interactive spoken dialog provides many new challenges for natural language understanding systems. One of the most critical challenges is simply determining the speaker's intended utterances: both segmenting the speaker's turn into utterances and determining the intended words in each utterance. Since there is no generally agreed-upon definition of what an utterance is, we instead focus on intonational phrases (Silverman et al., 1992), which end with an acoustically signaled boundary tone.

Even assuming perfect word recognition, the problem of determining the intended words is complicated by the occurrence of speech repairs, which occur where the speaker goes back and changes (or repeats) something she just said. The words that are replaced or repeated are no longer part of the intended utterance, and so need to be identified. The following example, from the Trains corpus (Heeman and Allen, 1995), illustrates a speech repair: the words that the speaker intends to be replaced are marked as the reparandum, the words that are the intended replacement are marked as the alteration, and the cue phrases and filled pauses that tend to occur in between are marked as the editing term.

Example 1 (d92a-5.2 utt34)
we'll pick up [...] uh the tanker of oranges
(annotations, in order: reparandum, interruption point, editing term, alteration)

Much work has been done both on detecting boundary tones (e.g., Wang and Hirschberg, 1992; Wightman and Ostendorf, 1994; Stolcke and Shriberg, 1996a; Kompe et al., 1994; Mast et al., 1996) and on speech repair detection and correction (e.g., Hindle, 1983; Bear, Dowding, and Shriberg, 1992; Nakatani and Hirschberg, 1994; Heeman and Allen, 1994; Stolcke and Shriberg, 1996b). This work has addressed each issue in isolation from the other. However, the two issues are intertwined. Cues such as the presence of silence, final syllable lengthening, and the presence of filled pauses tend to mark both events. Even the presence of word correspondences, a traditional cue for detecting and correcting speech repairs, sometimes marks boundary tones as well, as illustrated by the following example, where the intonational phrase boundary is marked with the ToBI symbol %.

Example 2 (d93-83.3 utt73)
that's all you need % you only need one boxcar

Intonational phrases and speech repairs also interact with the identification of discourse markers. Discourse markers (Schiffrin, 1987; Hirschberg and Litman, 1993; Byron and Heeman, 1997) are used to relate new speech to the current discourse state. Lexical items that can function as discourse markers, such as "well" and "okay," are ambiguous as to whether they are being used as discourse markers or not. The complication is that discourse markers tend to be used to introduce a new utterance, or can be an utterance all to themselves (such as the acknowledgment "okay" or "alright"), or can be used as part of the editing term of a speech repair, or to begin the alteration. Hence, the problem of identifying discourse markers also needs to be addressed together with the segmentation and speech repair problems.

These three phenomena of spoken dialog, however, cannot be resolved without recourse to syntactic information. Speech repairs, for example, are often
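
The reparandum, editing term, and alteration structure described above can be made concrete with a small Python sketch. It is only an illustration, not the statistical model proposed in the paper: it assumes the role of each word is already known, whereas the paper's model has to predict those roles. The `Token` class, the `intended_words` function, the role names, and the sample utterance are all hypothetical and are not taken from the paper or the Trains corpus.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    word: str
    # Hypothetical role labels: "fluent", "reparandum",
    # "editing_term", or "alteration".
    role: str


def intended_words(tokens: List[Token]) -> List[str]:
    """Return the words of the intended utterance.

    Reparandum and editing-term words are dropped; the alteration
    and all fluent words are kept in their original order.
    """
    removed = {"reparandum", "editing_term"}
    return [t.word for t in tokens if t.role not in removed]


# Made-up utterance for illustration (not from the Trains corpus).
utterance = [
    Token("take", "reparandum"),
    Token("the", "reparandum"),
    Token("engine", "reparandum"),
    Token("um", "editing_term"),
    Token("take", "alteration"),
    Token("the", "alteration"),
    Token("boxcar", "alteration"),
    Token("to", "fluent"),
    Token("Corning", "fluent"),
]

print(" ".join(intended_words(utterance)))
```

Dropping the reparandum and the editing term leaves `take the boxcar to Corning`, the words the speaker intended.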
Similar resources
Speech Repairs, Intonational Boundaries and Discourse Markers: Modeling Speakers' Utterances in Spoken Dialog
Interactive spoken dialog provides many new challenges for natural language understanding systems. One of the most critical challenges is simply determining the speaker’s intended utterances: both segmenting a speaker’s turn into utterances and determining the intended words in each utterance. Even assuming perfect word recognition, the latter problem is complicated by the occurrence of speech ...
Speech Repairs, Intonational Phrases and Discourse Markers: Modeling Speakers' Utterances in Spoken Dialog
Interactive spoken dialogue provides many new challenges for natural language understanding systems. One of the most critical challenges is simply determining the speaker’s intended utterances: both segmenting a speaker’s turn into utterances and determining the intended words in each utterance. Even assuming perfect word recognition, the latter problem is complicated by the occurrence of speec...
Speech Repairs, Intonational Phrases, And Discourse Markers: Modeling Speakers' Utterances In Spoken Dialogue
Interactive spoken dialogue provides many new challenges for natural language understanding systems. One of the most critical challenges is simply determining the speaker's intended utterances: both segmenting a speaker's turn into utterances and determining the intended words in each utterance. Even assuming perfect word recognition, the latter problem is complicated by the occurrence of speec...
Identifying Discourse Markers in Spoken Dialog
In this paper, we present a method for identifying discourse marker usage in spontaneous speech based on machine learning. Discourse markers are denoted by special POS tags, and thus the process of POS tagging can be used to identify discourse markers. By incorporating POS tagging into language modeling, discourse markers can be identified during speech recognition, in which the timeliness of t...
Modeling spontaneous speech events during recognition
In spontaneous speech, speakers segment their speech into intonational phrases, and make repairs to what they are saying. However, techniques for understanding spontaneous speech tend to treat these events as noise, in the same manner as they handle out-of-grammar constructions and misrecognitions. In our approach, we advocate that these events should be explicitly modeled. We modify the speech...